Abstract 

We address the problem of optimally partitioning the modules of 
chain- or tree-like tasks over chain-structured or host-satellite multiple 
computer systems. This important class of problems includes many 
signal processing and industrial control applications. Prior research 
has resulted in a succession of faster exact and approximate algorithms 
for these problems. 

We describe polynomial exact and approximate algorithms for this 
class that are better than any of the previously reported algorithms. 
Our approach is based on a preprocessing step that condenses the 
given chain or tree structured task into a monotonic chain or tree. 
The partitioning of this monotonic task can then be carried out using 
fast search techniques. 

1 Introduction 


The problem of assigning the constituent parts of a large parallel application 
onto the processors of a multiple computer system is one of the key issues 
in parallel processing. While the general form of this problem has eluded 
efficient solution [1, 3] there has been considerable success for problems with 
constrained structure. The mapping of problems with chain- or tree-like 
structure on multiple computer systems with chain-like interconnection or on 
host-satellite systems was shown to have exact polynomial time solutions by 
Bokhari [2]. Iqbal [6] subsequently developed faster approximate algorithms 
for this class of problems. These fully polynomial algorithms were faster but 
provided solutions only to a desired degree of accuracy $\epsilon$. Nicol & O’Hallaron 
[8] improved Bokhari’s exact algorithms and developed new algorithms that 
were still faster but operated under the assumption of bounded execution and 
communication costs. In the present paper we describe a new ‘condensation’ 
approach that permits exact polynomial time solutions to these problems 
that are faster than any of the previously reported exact or approximate 
algorithms. Our approach involves a preprocessing step on the given chain 
or tree that makes it monotonic and permits a very fast exact solution. These 
new algorithms are straightforward to implement and provide the exactness 
of [2], the speed of [8], are no more involved than those of [6], and make no 
assumptions about magnitudes of costs. 

Chain-structured computations form an important class that includes 
many signal processing applications. Such computations are conveniently 
carried out on chain structured machines in parallel or pipelined mode [4, 5]. 
Tree-structured computations also arise in signal processing as well as in 
industrial control applications [2]. In the latter case sensor inputs from the 
shop floor are processed up the nodes of a tree to a central control node, and 
control signals travel in the reverse direction. Such tree-structured compu- 
tations can be partitioned over the processors of a host-satellite system to 
improve response time. 

In Section 2 of this paper we describe the key theoretical results related 
to our condensation approach. We show how monotonic chains are obtained 
and discuss their properties. The concept of monotonicity permits us to 
develop improved algorithms for partitioning chain structured programs on 
chain connected processors. We describe approximate and exact algorithms 
that utilize the condensation approach in Section 3. Section 4 addresses 
the problem of assigning multiple chain-structured computations on a 
host-satellite system and develops improved approximate and exact algorithms 
for these. In Section 5 we describe an improved exact algorithm for parti- 
tioning a tree structured computation over a host-satellite system. Section 6 
summarizes the results of this paper. 


2 The Partitioning Problem 

In this Section we will define our assignment problem and discuss the prop- 
erties of chains. We will show how a given chain can be transformed into a 
monotonic chain and how this transformation permits faster solutions to the 
assignment problem. 

2.1 Statement of Problem 

We will assume that we are given a chain-structured program of $m$ modules 
(numbered 1 to $m$) and that this is to be partitioned over a chain structured 
processor with $n < m$ nodes (numbered 1 to $n$). With each module $i$ is 
associated an execution cost $w_i$ and a communication cost $c_i$. Here $w_i$ is the 
time required to execute that module on any processor (we assume a homogeneous 
system), while $c_i$ is the time required for module $i$ to communicate with 
module $i + 1$. 

We will work under the assumption that each processor has a contiguous 
subchain of modules assigned to it. Thus the chain is partitioned into 
subchains such that modules $i$ and $i + 1$ reside on the same or on adjacent 
processors. We call this the contiguity constraint. When a subchain is 
assigned to a processor, the load on that processor is the sum of the execution 
costs $w_i$ of its modules plus the communication costs for the two modules at 
the ends of the subchain. The time required for the entire system to complete 
the task is equal to the time taken by the most heavily loaded processor, which 
is equivalent to the weight of the heaviest subchain. The next subsection 
summarizes these definitions. 

The problem of finding the partitioning that minimizes the weight of 
the heaviest subchain was originally solved by Bokhari [2] in $O(m^3 n)$ time. 
This is an exact algorithm that makes no assumptions about the magnitudes 
of the execution or communication costs. This algorithm was improved to 
$O(m^2 n)$ by Nicol & O’Hallaron [8]*. Iqbal [6] developed a fully polynomial 
approximation algorithm that obtained an assignment optimal to within a 
factor of $\epsilon$ in time $O(mn \log(W/\epsilon))$, where $W$ is the sum of all execution 
costs. Nicol & O’Hallaron [8] reported a carefully developed algorithm that 
could solve this problem in $O(mn \log m)$ time under the assumption that 
the $w_i$s and the $c_i$s are bounded. One of the major results in the present 
paper is an algorithm that solves this problem in $O(mn \log m)$ time with no 
assumptions about the magnitudes of costs. We will also describe a faster 
approximation algorithm. 

Since we will be discussing the partitioning of a chain of modules over 
a chain of homogeneous processors, the problem is equivalent to partition- 
ing chains into subchains. We will consider subchains and processors to be 
synonymous in the following discussion. 


2.2 Definitions 

$w_i$  execution time of module $i$. 

$c_i$  communication time between modules $i$ and $i + 1$. 
    We assume that $c_0$ ($c_m$) is the time required for module 1 ($m$) to 
    communicate with the outside world. 

$W$  load on a processor if all $m$ modules are assigned to it. 

    $W = \sum_{i=1}^{m} w_i + c_0 + c_m$ 

$\Omega_{p,s,t}$  load on processor $p$ if subchain $s \ldots t$ is assigned to it. 

    $\Omega_{p,s,t} = \sum_{i=s}^{t} w_i + c_{s-1} + c_t$ 

    This is synonymous with the weight of subchain $p$. 

$\tau(p)$  a vector of length $n$ that specifies the partition. 
    Processor $p$ has the subchain $\tau(p-1)+1 \ldots \tau(p)$ assigned to it, with 
    $\tau(0) = 0$. 

bottleneck processor/subchain: for a given $\tau(p)$ the processor/subchain with 
    weight $\max_p \{\Omega_{p,\tau(p-1)+1,\tau(p)}\}$. 

$\omega(\tau(p))$  weight of a partition = the weight of its bottleneck processor. This 
    is denoted by $\omega$ when no confusion is likely. 

The optimal partition is the $\tau(p)$ for which the weight $\omega(\tau(p))$ is minimum. 

* These two algorithms permit heterogeneous processors. The remaining algorithms 
assume homogeneous processors. 
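As a concrete illustration of these definitions, the following minimal Python sketch evaluates the subchain weight $\Omega_{p,s,t}$ and the weight $\omega$ of a given partition. The list layout (`w[i-1]` holds $w_i$, `c[i]` holds $c_i$ for $0 \le i \le m$) and the function names are assumptions made for this example; they do not appear in the original text.

```python
def subchain_weight(w, c, s, t):
    """Load of the subchain of modules s..t (1-based, inclusive).

    Assumed layout: w[i-1] is w_i, and c[i] is c_i for i = 0..m.
    """
    return sum(w[s - 1:t]) + c[s - 1] + c[t]


def partition_weight(w, c, tau):
    """Weight of the partition tau = [tau(1), ..., tau(n)], with tau(0) = 0.

    The weight of a partition is the weight of its bottleneck subchain.
    """
    bounds = [0] + list(tau)
    return max(
        subchain_weight(w, c, bounds[p - 1] + 1, bounds[p])
        for p in range(1, len(bounds))
    )


# Hypothetical 3-module chain split over 2 processors in two different ways.
w = [2, 1, 3]
c = [0, 5, 1, 0]
print(partition_weight(w, c, [1, 3]))  # cut on the expensive edge c_1
print(partition_weight(w, c, [2, 3]))  # cut on the cheap edge c_2
```

Cutting the chain at the expensive edge $c_1$ loads both neighbouring subchains with that cost, which is the situation the condensation step described later is designed to rule out.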


2.3 The Condensation Theorem 

Theorem 1. Consider a chain that has a partition of weight $\omega$, and in which 
there exists an edge $c_t$ such that either $c_t \ge w_{t+1} + c_{t+1}$ or 
$c_t \ge w_t + c_{t-1}$, or both. Then this chain will continue to have a partition 
of weight $\le \omega$ if we merge modules $t$ and $t + 1$. 


Proof. In the given partition of weight $\omega$, modules $t$ and $t + 1$ must belong 
to different subchains, otherwise the proof is trivial. We assume that modules 
$s \ldots t$ belong to subchain $p$ and modules $t + 1 \ldots u$ belong to subchain 
$p + 1$ (see Figure 1). The weights of these subchains are 

$\Omega_{p,s,t} = \sum_{i=s}^{t} w_i + c_{s-1} + c_t$ 

$\Omega_{p+1,t+1,u} = \sum_{i=t+1}^{u} w_i + c_t + c_u.$ 

Let us merge modules $t$ and $t + 1$ into one module. The condensed module 
can be assigned either to subchain $p$ or to subchain $p + 1$. If it is assigned to 
subchain $p$, the weights of the two subchains become 

$\Omega_{p,s,t+1} = \sum_{i=s}^{t+1} w_i + c_{s-1} + c_{t+1} = \Omega_{p,s,t} + w_{t+1} - c_t + c_{t+1}$ 

$\Omega_{p+1,t+2,u} = \sum_{i=t+2}^{u} w_i + c_{t+1} + c_u = \Omega_{p+1,t+1,u} - w_{t+1} - c_t + c_{t+1}.$ 

If $c_t \ge w_{t+1} + c_{t+1}$ we obtain 

$\Omega_{p,s,t+1} \le \Omega_{p,s,t}$ 

$\Omega_{p+1,t+2,u} \le \Omega_{p+1,t+1,u}.$ 

If the condensed module is assigned to subchain $p + 1$, the weights of the 
two subchains become 

$\Omega_{p,s,t-1} = \sum_{i=s}^{t-1} w_i + c_{s-1} + c_{t-1} = \Omega_{p,s,t} - w_t - c_t + c_{t-1}$ 

$\Omega_{p+1,t,u} = \sum_{i=t}^{u} w_i + c_{t-1} + c_u = \Omega_{p+1,t+1,u} + w_t - c_t + c_{t-1}.$ 

If $c_t \ge w_t + c_{t-1}$ we obtain 

$\Omega_{p,s,t-1} \le \Omega_{p,s,t}$ 

$\Omega_{p+1,t,u} \le \Omega_{p+1,t+1,u}.$ 

Our condensation disturbs only subchains $p$ and $p + 1$; all other subchains 
remain undisturbed. The pairs of inequalities obtained above assure us that 
there will always be one case in which the weights of the condensed 
subchains are no greater than the weights of the original uncondensed 
subchains. Thus the entire condensed chain will have a partition with 
weight $\le \omega$. □ 

Figure 1: A chain of $m$ modules mapped onto a chain of $n$ processors. The 
$w_i$s are execution costs; the $c_i$s are communication costs. Modules $s \ldots t$ are 
assigned to processor $p$; modules $t + 1 \ldots u$ are assigned to processor $p + 1$. 

2.4 Monotonic Chains 

A given chain of $m$ modules can be transformed into a chain of $m' \le m$ 
modules by applying the procedure condense. This procedure looks at all 
edges in the chain and merges modules $t$ and $t + 1$ if $c_t \ge w_{t+1} + c_{t+1}$ or 
$c_t \ge w_t + c_{t-1}$, or both. From Theorem 1, we know that if a given chain has 
a partition of weight $\omega$ the corresponding condensed chain will also have a 
partition of weight $\le \omega$. This procedure obviously takes $O(m)$ time. 

Figure 2: Top. A 10 module chain and the plot of its $\Omega_{1,1,t}$, which is not 
monotone. Bottom. The 10 module chain transformed into a 7 module chain 
by applying procedure condense. The plot of the condensed chain’s $\Omega_{1,1,t}$ is 
monotonic. 
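The condensation step can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the list layout (`w[i-1]` holds $w_i$, `c[i]` holds $c_i$ for $0 \le i \le m$), the name `condense` as a Python function, and the non-strict comparison are choices made here. For simplicity the sketch repeats its pass until no edge qualifies, so it does not attempt to meet the $O(m)$ bound quoted in the text.

```python
def condense(w, c):
    """Merge modules t and t+1 whenever edge c_t is at least as heavy as
    one of its neighbouring (module weight + outer edge) sums.

    Assumed layout: w[i-1] is w_i, c[i] is c_i for i = 0..m.
    Returns new lists; the inputs are left untouched.
    """
    w, c = list(w), list(c)
    changed = True
    while changed:                      # repeat until no edge qualifies
        changed = False
        k = 1                           # c[k] joins modules k and k+1 (1-based)
        while k < len(w):
            right = w[k] + c[k + 1]     # w_{t+1} + c_{t+1}
            left = w[k - 1] + c[k - 1]  # w_t + c_{t-1}
            if c[k] >= right or c[k] >= left:
                w[k - 1] += w[k]        # merged module carries both costs
                del w[k]
                del c[k]                # the internal edge disappears
                changed = True          # re-examine the edge now at index k
            else:
                k += 1
    return w, c


# Hypothetical chain whose first edge is too expensive to cut.
print(condense([2, 1, 3], [0, 5, 1, 0]))
```

After the call every remaining internal edge is strictly lighter than both of its neighbouring sums, so extending a subchain by one module can never reduce its weight.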

Theorem 2. In a chain that has been transformed by applying procedure 
condense, $\Omega_{p,s,t} \le \Omega_{p,s,t+1}$ for all $1 \le p \le n$, $1 \le s \le t < m$. 

Proof. By contradiction. Suppose $\Omega_{p,s,t} > \Omega_{p,s,t+1}$. Then 

$\sum_{i=s}^{t} w_i + c_t + c_{s-1} > \sum_{i=s}^{t+1} w_i + c_{t+1} + c_{s-1}$, and thus 

$c_t > w_{t+1} + c_{t+1}.$    (1) 

But this is impossible since condense removes all edges that satisfy (1). □ 

An important consequence of Theorem 2 is the fact that all condensed 
chains are monotonic: the weight of a subchain cannot decrease as more 
nodes are added to it. This property is crucial to the material that follows. 

2.5 Probing Function 

Once a given chain has been transformed into a monotonic chain, we can 
use the function probe($m$, $n$, $\omega$) on it. This procedure returns true if it is 
possible to partition the given chain of $m$ modules into $n$ subchains each with 
weight $\le \omega$, and false otherwise. 

function probe(processors[1 ... n], modules[1 ... m], $\omega$): boolean; 
begin 
  1. s := 1; t := 1; p := 1; 
  2. while p $\le$ n do 
     begin 
  3.   attempt to find a t $\ge$ s such that 
         ($\Omega_{p,s,t} \le \omega$) and (($\Omega_{p,s,t+1} > \omega$) or (t = m)); 
       if no such t exists then return(false); 
  4.   if t = m then return(true); 
  5.   Assign subchain s ... t to processor p; 
  6.   s := t + 1; p := p + 1; 
     end; 
  7. return(false); 
end; 

The search at step 3 can be carried out by simply incrementing $t$, in which 
case this procedure takes time proportional to $m$, the number of modules in 
the condensed chain. However, the monotonicity of the condensed chain 
permits us to use a binary search over the remaining modules at step 3. This 
is because once we have computed $\Omega_{1,1,t}$ for all $t$, there is no need to compute 
any other $\Omega_{p,s,t}$, since $\Omega_{p,s,t} = \Omega_{1,1,t} - c_0 - \sum_{i=1}^{s-1} w_i + c_{s-1}$ (this is 
illustrated in Figure 3). Thus we need to compute $\Omega_{1,1,t}$ once for all $t$, and 
compute $\sum_{i=1}^{s-1} w_i$ once for all $s$. These computations take $O(m)$ time each and 
subsequently let us execute probe in $O(n \log m)$ time. Thus each execution of 
probe takes $O(\min(m, n \log m))$ time, depending on the search strategy. 

This is a greedy algorithm and the partition that it returns is called a 
greedy partition. In [7] a similar probing function was applied to chains with 
zero communication costs. 

Theorem 3. If it is possible to partition a chain with $m$ modules into $n$ 
subchains, each with weight at most $\omega$, the function probe($m$, $n$, $\omega$) will 
always find that partition or another partition of weight $\le \omega$. 

Proof. Similar to the proof given in [7]. Omitted for brevity. □ 
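The binary-search variant of the probing function can be sketched in Python as below. The sketch assumes the chain has already been condensed, and it relies on the identity that the weight of subchain $s \ldots t$ equals $(\sum_{i=1}^{t} w_i + c_t) - \sum_{i=1}^{s-1} w_i + c_{s-1}$, so one precomputed nondecreasing array serves every starting point $s$. The list layout (`w[i-1]` holds $w_i$, `c[i]` holds $c_i$) and the Python signature are assumptions of this example, not the paper's interface.

```python
from bisect import bisect_right
from itertools import accumulate


def probe(w, c, n, omega):
    """Greedy feasibility test on a condensed (monotonic) chain.

    Returns True if the chain can be cut into at most n contiguous
    subchains, each of weight <= omega.
    Assumed layout: w[i-1] is w_i, c[i] is c_i for i = 0..m.
    """
    m = len(w)
    prefix = [0] + list(accumulate(w))       # prefix[t] = w_1 + ... + w_t
    # base[t] = prefix[t] + c_t is nondecreasing for t >= 1 when the
    # chain is condensed, which is what makes bisection valid.
    base = [prefix[t] + c[t] for t in range(m + 1)]
    s = 1
    for _ in range(n):
        # weight(s..t) = base[t] - prefix[s-1] + c[s-1] <= omega
        limit = omega + prefix[s - 1] - c[s - 1]
        t = bisect_right(base, limit, lo=s) - 1   # largest feasible t
        if t < s:
            return False                     # module s alone is too heavy
        if t == m:
            return True                      # the whole chain is placed
        s = t + 1
    return False                             # ran out of processors


print(probe([3, 3], [0, 1, 0], 2, 4))  # True
print(probe([3, 3], [0, 1, 0], 2, 3))  # False
```

Each of the at most $n$ iterations performs one bisection over at most $m$ entries, in line with the $O(n \log m)$ cost stated for this search strategy once the prefix sums are available.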


3 Partitioning Chains on Chains 

We now show how the results of the preceding Section can be used to obtain 
faster algorithms for partitioning chains on chains. We will discuss first an 
approximation algorithm that supplies an answer to within any specified 
degree $\epsilon$ of accuracy. We will then go on to develop a fast exact algorithm. 

3.1 Approximate Assignment 

Suppose we wish to solve the problem of partitioning chains on chains 
approximately. That is, we wish to partition a chain of $m$ modules into $n$ 
subchains such that the weight of the heaviest subchain is within $\epsilon$ of the 
optimal partition. We proceed by first applying procedure condense on the 
given chain. An upper bound on the weight of the optimal partition is $W$, the 
cost of executing all modules on one processor. A lower bound is 0. We can 
divide this interval into no more than $W/\epsilon$ subintervals and conduct a binary 
search using probe over this range. A binary search is permissible since the 
chain has been condensed into a monotonic chain. Thus the time required is 
$O(\min(m, n \log m) \log(W/\epsilon))$. This is better than the best previously known 
approximation algorithm [6] which is $O(mn \log(W/\epsilon))$. 

Figure 3: The plots of $\Omega_{1,1,t}$ and $\Omega_{s,s,t}$ are spaced exactly 
$\sum_{i=1}^{s-1} w_i + c_0 - c_{s-1}$ apart. Thus a binary search on $\Omega_{2,2,t}$ can be carried 
out on $\Omega_{1,1,t}$ by compensating for the offset $w_1 + c_0 - c_1$. Some numbers have 
been omitted to avoid congestion. 
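The approximate search described in this subsection can be sketched as a bisection on the interval $[0, W]$. The sketch below assumes the chain is already condensed and, to stay short, uses the simple incrementing form of the probing function rather than the binary-search form. The function names and the list layout (`w[j]` is the cost of the 0-based module `j`, whose left and right edges are `c[j]` and `c[j+1]`) are assumptions of this example.

```python
def probe_linear(w, c, n, omega):
    """Greedy feasibility test that extends each subchain one module at a time."""
    m, s = len(w), 0
    for _ in range(n):
        total, t = 0, s
        # Grow the subchain s..t-1 while its weight stays within omega.
        while t < m and total + w[t] + c[s] + c[t + 1] <= omega:
            total += w[t]
            t += 1
        if t == s:
            return False        # even a single module exceeds omega
        if t == m:
            return True
        s = t
    return False


def approximate_bottleneck(w, c, n, eps):
    """Return a feasible bottleneck weight within eps of the optimum.

    Invariant: lo is infeasible (or zero), hi is feasible.
    """
    lo, hi = 0.0, float(sum(w) + c[0] + c[-1])   # hi = W, always feasible
    if probe_linear(w, c, n, lo):
        return lo
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if probe_linear(w, c, n, mid):
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical condensed chain; its optimal 2-way bottleneck is 4.
print(approximate_bottleneck([3, 3], [0, 1, 0], 2, 0.01))
```

Halving the interval until it is narrower than $\epsilon$ takes about $\log_2(W/\epsilon)$ probes, which matches the logarithmic factor in the bound above.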

3.2 A Simple Exact Algorithm 

Once we have condensed our chain of modules into a monotonic chain, we 
can compute the $O(m^2)$ values of $\Omega_{s,s,t}$, $1 \le s \le m$, $1 \le t \le m$ (we 
assume that the condensed chain has $m$ modules). We can arrange these 
values in a master sorted list without having to sort explicitly. This is 
because each $\Omega_{s,s,t}$ is monotonic in $t$ for a fixed $s$. We can thus merge each 
row of $\Omega$s into the master list in $O(m^2 \log m)$ time. Once this list has been 
generated, we can binary search over it using probe and find the optimal 
assignment in $O(\min(m, n \log m) \log m)$ time. Assuming $m > n$, the total time 
is dominated by the time to create the master list, which is $O(m^2 \log m)$. 
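A minimal Python sketch of this simple exact search follows. It is an illustration under stated assumptions: the candidate weights are sorted explicitly instead of being merged row by row (the asymptotic cost is the same $O(m^2 \log m)$), the chain is assumed to be condensed, and the feasibility test is passed in as a callable `probe(w, c, n, omega)`, a hypothetical interface chosen for this example.

```python
def exact_bottleneck(w, c, n, probe):
    """Smallest candidate subchain weight for which probe reports success.

    Assumed layout: w[i-1] is w_i, c[i] is c_i for i = 0..m.
    probe(w, c, n, omega) must be monotone in omega: once it returns
    True it stays True for every larger omega.
    """
    m = len(w)
    prefix = [0]
    for x in w:
        prefix.append(prefix[-1] + x)
    # Every subchain weight is a candidate for the optimal bottleneck.
    candidates = sorted({
        prefix[t] - prefix[s - 1] + c[s - 1] + c[t]
        for s in range(1, m + 1)
        for t in range(s, m + 1)
    })
    lo, hi = 0, len(candidates) - 1   # the largest candidate is always feasible
    while lo < hi:
        mid = (lo + hi) // 2
        if probe(w, c, n, candidates[mid]):
            hi = mid
        else:
            lo = mid + 1
    return candidates[lo]


# Stand-in probe for a chain whose true 2-processor optimum is 4.
print(exact_bottleneck([3, 3], [0, 1, 0], 2, lambda w, c, n, x: x >= 4))
```

Because the optimal bottleneck is itself the weight of some subchain, searching only over these candidate values loses nothing, and the binary search needs $O(\log m)$ probes.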

3.3 Improved Exact Algorithm 

Observe first that, since our chain connected system is homogeneous, 
$\Omega_{p,s,t} = \Omega_{q,s,t}$ for all $p$, $q$. Thus we can always fix module 1 to processor 1 
and consider only the $m(m-1)/2$ values of $\Omega_{s,s,t}$, $1 \le s \le m$, $1 \le t \le m$. 

The number of probes required to find the optimal bottleneck subchain 
can be reduced by carefully analyzing the relationships between $\Omega$s. These 
are shown by the lattice of Figure 4 in which each node represents an $\Omega$ and 
a directed edge from node $p$ to node $q$ implies that $p \ge q$. Monotonicity 
of the chain ensures that $\Omega_{s,s,t+1} \ge \Omega_{s,s,t}$. This accounts for the horizontal 
edges. We can also observe that $\Omega_{s,s,t} - \Omega_{s+1,s+1,t} = c_{s-1} + w_s - c_s$, which is 
positive for condensed chains. This accounts for the vertical edges. 

We can use binary search with probe over the median row $s'$ of this lattice 
to find the smallest $t'$ for which probe($\Omega_{s',s',t'}$) is true. Once this has been 
done and the value of $\Omega_{s',s',t'}$ recorded, we can eliminate from consideration 
all $\Omega_{s,s,t}$ with $s \ge s'$ and $t < t'$ since probe($\Omega_{s,s,t}$) is guaranteed to be false 
in this range. We can also eliminate all $\Omega_{s,s,t}$ with $s \le s'$ and $t \ge t'$, 
since $\Omega_{s',s',t'}$ is the smallest feasible value in this region. Figure 4 illustrates 
these regions. This process of elimination is continued recursively on the two 
remaining subregions. This 2-dimensional search technique is due to Nicol & 
O’Hallaron [8] who show that it takes no more than $4m$ probes to find the 
optimal value. 

Since our probe takes $O(\min(m, n \log m))$, we have an overall complexity 
of $O(mn \log m)$. This is the same as Nicol & O’Hallaron’s algorithm [8], which 
assumes bounded execution and communication costs. Our algorithm makes 
no assumptions about the magnitudes of costs. 

Figure 4: Illustration of 2-dimensional binary search over $\Omega_{s,s,t}$. A search 
over row 3 yields $\Omega_{3,3,5}$ as the smallest for which probe returns true. We 
can now eliminate from consideration all $\Omega$s in the dotted region, as probe 
can never be true for these. We can also eliminate the dashed region, since 
$\Omega_{3,3,5}$ is the smallest from among these $\Omega$s. 

4 Chains on Host-Satellite Systems 

We now address the problem of partitioning multiple chains on a host-satellite 
system. In this case we assume that we have a large, powerful host computer 
connected to many smaller satellite machines (Figure 5). Each satellite re- 
ceives a data stream from a real time environment, performs a chain of com- 
putations on it, and forwards the results to the central host. It is possible 
to partition each satellite’s chain so that some of its modules reside on the 
host and take advantage of the host’s greater computational power. We are 
interested in minimizing the time required for all satellites to complete one 
iteration of their respective tasks. If too much load is assigned to the host, 
then the time to complete one iteration of all tasks will increase to an in- 
tolerable extent. On the other hand if all chains reside on their respective 
satellites then the power of the host is wasted. The problem is to find a 
balance between the two extremes, i.e. a partitioning of the several chains 
that minimizes the maximum of (1) the most heavily loaded satellite and (2) 
the total load on the host. As before, we assume partitions into contiguous 
subchains. In the present case, this means that each chain is divided into 
two contiguous subchains, one of which resides on the host and the other on 
the satellite. 


4.1 Definitions 

$n$  number of satellites. 

$m$  number of modules per chain. For simplicity, we assume that all 
    chains have the same number of modules. 

$w_{i,s}$  execution time of module $i$ of satellite $s$. 

$c_{i,s}$  communication time between modules $i$ and $i + 1$ of satellite $s$. We 
    assume that $c_{0,s}$ ($c_{m,s}$) is the time required for module 1 ($m$) of satellite 
    $s$ to communicate with the outside world (the host). 

$a_s$  for satellite $s$, the ratio of compute time for a module on the satellite 
    to its compute time on the host. Thus module $i$ will take time $w_{i,s}$ on 
    satellite $s$ and $w_{i,s}/a_s$ on the host. 

$\Omega_{s,t}$  load on satellite $s$ if subchain $1 \ldots t$ is assigned to it. 

    $\Omega_{s,t} = \sum_{i=1}^{t} w_{i,s} + c_{t,s}$ 

$\Lambda_{s,t}$  load on host caused by modules $t + 1 \ldots m$ of chain $s$. 

    $\Lambda_{s,t} = \sum_{i=t+1}^{m} w_{i,s}/a_s + c_{t,s}$ 

We can denote a partition of chains by the vector $\tau_1, \tau_2, \ldots, \tau_n$ such that 
modules $1 \ldots \tau_s$ of chain $s$ are assigned to satellite $s$ and the remaining to 
the host. The time required by this partition is 

$\max\left( \max_{1 \le s \le n} \Omega_{s,\tau_s},\ \sum_{s=1}^{n} \Lambda_{s,\tau_s} \right).$    (2) 


4.2 Condensing Chains 

The chains of our single-host multiple-satellite system can be condensed into 
monotonic chains. A complicating issue is the fact that each module has 
two execution costs ($w_{i,s}$ on the satellite and $w_{i,s}/a_s$ on the host). A chain 
that is monotonic with respect to one execution cost may not necessarily be 
monotonic with respect to the other. However the probing function that we 
describe in the following subsection is concerned only with satellite weights 
and it therefore suffices to condense the chain with respect to these satellite 
weights. 

4.3 Probing Function 

We now assume that all our $n$ chains of $m$ modules are condensed, monotonic 
chains as discussed above. If we view a single host-satellite combination as a 
two processor system, we can apply a simple modification of function probe 
of Section 2.5 to determine if this chain can be divided into two subchains 
such that the satellite has load $\le \omega$ on it and the number $k$ of modules on the 
satellite is maximum. Since our chains are monotonic with respect to satellite 
costs, this version of probe can use binary search and provide an answer in 
$O(\log m)$ time. This function will return true or false and will specify $k$ and 
$\Omega_{s,k}$ in case the answer is true. It is straightforward to compute $\Lambda_{s,k}$ in 
constant time from this information. 

Given an $\omega$ we can compute if there exists a partition that puts $\le \omega$ 
load on each of the satellites and $\sum_{s=1}^{n} \Lambda_{s,\tau_s} \le \omega$ total load on the host as 
follows. Apply probe($\omega$) to each of the satellites, computing and adding 
up all $\Lambda_{s,k}$ as they are reported. If all processors answered true and if 
$\sum_{s=1}^{n} \Lambda_{s,k} \le \omega$, there does indeed exist a partition that puts no more than 
$\omega$ load on each of the satellites and on the host. This entire ‘ensemble’ probe 
can be carried out in $O(n \log m)$ time. 
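The ‘ensemble’ probe can be sketched in Python as follows. This is a sketch under several assumptions that are not in the original text: each chain is given as a hypothetical tuple `(w, c, a)` with `w[i-1]` holding $w_{i,s}$, `c[i]` holding $c_{i,s}$ and `a` holding $a_s$; each chain is already condensed with respect to its satellite costs; and a satellite must keep at least one module ($k \ge 1$). For brevity the prefix sums are rebuilt on every call, whereas a real implementation would compute them once so that each probe costs $O(n \log m)$.

```python
from bisect import bisect_right
from itertools import accumulate


def ensemble_probe(chains, omega):
    """True if every satellite load and the total host load are <= omega.

    chains: list of (w, c, a) tuples, one per satellite (assumed layout).
    """
    host_load = 0.0
    for w, c, a in chains:
        m = len(w)
        prefix = [0] + list(accumulate(w))
        # sat[k-1] = load on the satellite if modules 1..k stay on it;
        # nondecreasing in k for a condensed chain.
        sat = [prefix[k] + c[k] for k in range(1, m + 1)]
        k = bisect_right(sat, omega)      # largest k with sat load <= omega
        if k == 0:
            return False                  # even one module overloads the satellite
        # Host load from modules k+1..m of this chain, plus the cut edge c_k.
        host_load += (prefix[m] - prefix[k]) / a + c[k]
    return host_load <= omega


# Two hypothetical identical satellites, host four times faster.
chains = [([2, 6], [0, 1, 0], 4.0), ([2, 6], [0, 1, 0], 4.0)]
print(ensemble_probe(chains, 5))    # True
print(ensemble_probe(chains, 4.9))  # False
```

In the example, $\omega = 5$ keeps only the first module on each satellite (load 3) and places the rest on the host (load 2.5 per chain), so the host total of 5 is just feasible.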

4.4 Partitioning Algorithms 

In a problem with $n$ chains of $m$ modules each, there are $mn$ possible values 
of $\omega$. We could carry out $mn$ ‘ensemble’ probes to obtain the assignment 
that minimizes (2) in $O(mn^2 \log m)$ time. This is an exact algorithm but is 
not an improvement over previously known exact algorithms. If we denote 
by $W$ the time taken if all modules are assigned to the host and resolve to 
an accuracy of $\epsilon$, we immediately obtain an approximation algorithm that 
takes $O(n \log m \log(W/\epsilon))$ time, which is better than Iqbal’s $O(mn \log(W/\epsilon))$ 
approximation algorithm [6]. 

However it is possible to do much better. Note that our $n$ monotonic 
chains have $m$ potential $\omega$s each, in ascending order. These $n$ lists can be 
merged into one sorted list in $O(mn \log n)$ time. We can subsequently use 
binary search over this sorted list to solve our problem in $O(\log(mn)\, n \log m) = 
O(n \log^2 m + n \log m \log n)$ time. This time is dominated by the $O(mn \log n)$ 
time to condense chains and to merge $\omega$s. This time is equal to Nicol & 
O’Hallaron’s $O(mn \log n)$ algorithm, which assumes bounded execution and 
communication costs. Our algorithm makes no such assumption. 

5 Trees on Host-Satellite Systems 

We now consider the problem of partitioning a tree structured program over 
a host-satellite system. Our program is made up of a number of modules that 
can execute either on the host or on one of the satellites. As in the previous 
Section, we have a motivation to assign as many modules as possible on the 
host in order to take advantage of its greater power. However, we do not 
wish to load the host to the point that the time required for it to complete 
its portion of the task is greater than the time that would have been required 
by the satellites. 

We will assume that our partitioning is under the following constraints. 

1. The root of the tree is always assigned to the host, 

2. if a specific node is assigned to a satellite, all its children nodes are also 
assigned to the same satellite, 

3. if two nodes are assigned to a satellite, their lowest common ancestor 
is also assigned to the same satellite. 

In other words each satellite has a single maximal subtree assigned to it. An 
example of a partition that satisfies these constraints is given in Figure 7. We 
will assume that we have available as many nodes as there are satellites and 
that the optimal assignment may choose not to use some of them. This is a 
good model of many industrial process monitoring and/or control systems. 
In such systems, external information from the shop floor is gathered by 
satellite computers and processed in a hierarchical fashion up the levels of a 
tree. The root of this tree resides on a large, central host machine. Control 
signals from the host travel in the opposite direction. Processing may be 
done in a pipelined or parallel fashion. It is important to partition the tree 
between the host and the satellites such that the response time of the system 
is minimized. As in the preceding Section, this response time is the 
larger of (1) the load on the most heavily loaded satellite and (2) the total 
load on the host. 

5.1 Definitions 


$m$  number of modules in the tree. 

$n$  number of satellites in a given partition. 

$e_i$  execution time of module $i$ on a satellite. All satellites are assumed 
    to be similar. 

$a$  the ratio of compute time for a module on a satellite to its compute 
    time on the host. Thus module $i$ will take $e_i$ time on a satellite and 
    $e_i/a$ time on the host. We assume that $a > 1$. 

$c_i$  communication time between modules $i$ and father($i$) if $i$ is assigned 
    to a satellite and father($i$) to the host. 

$C(i)$  the set of children of node $i$. 

$\tau(p)$  the root node of the subtree assigned to satellite $p$. 

$T(i)$  the set of nodes in the subtree rooted at node $i$. 

$H_i$  contribution to the load on the host made by the assignment of the 
    subtree rooted at node $i$ to the host. 

    $H_i = \sum_{j \in T(i)} e_j/a$ 

$W$  load on the host if all $m$ modules of the program are assigned to it. 

    $W = \sum_{i=1}^{m} e_i/a$ 

$S_i$  load on a satellite if the subtree rooted at module $i$ is assigned to it. 

    $S_i = \sum_{j \in T(i)} e_j + c_i$ 

$H_\tau$  total load on the host. 

    $H_\tau = W - \sum_{p=1}^{n} (H_{\tau(p)} - c_{\tau(p)})$ 

Our assignment is specified by the vector $\tau(p)$, $1 \le p \le n$, that specifies the 
root node of the subtree resident on each satellite. Given this vector, the 
weight of an assignment is 

$\max\left( \max_{1 \le p \le n} S_{\tau(p)},\ H_\tau \right).$    (3) 

5.2 Condensing Trees 

Theorem 4. Consider a tree that has a partition of weight $\omega$, and in which 
there exists a node $f$ with a child $g \in C(f)$ such that at least one of the 
following two inequalities holds. 

$c_g \ge c_f + \sum_{i \in \{T(f) - T(g)\}} e_i$    (4) 

$c_g \ge e_g + \sum_{i \in C(g)} c_i$    (5) 

Then this tree will continue to have a partition of weight $\le \omega$ if we merge 
nodes $f$ and $g$. 

Proof. The given partition of weight $\omega$ must assign $f$ to the host and $g$ 
to a satellite, otherwise the proof is trivial. When we merge $f$ and $g$, the 
condensed node $f + g$ can be assigned either to the host or to a satellite. 

If inequality (4) holds assign the condensed node to a satellite (see Figure 
8). In this case the load on the satellite before condensation was 

$S_g = \sum_{i \in T(g)} e_i + c_g.$ 

After condensation it is 

$S_f = \sum_{i \in T(f)} e_i + c_f.$ 

The decrease is 

$S_g - S_f = c_g - \sum_{i \in \{T(f) - T(g)\}} e_i - c_f.$ 

This quantity is non-negative because of (4). 

The load on the host will decrease by at least $c_g + e_f/a - c_f$, which is also 
non-negative because of (4). 

If inequality (5) holds then assign the condensed node to the host. In 
this case the load on the satellite before condensation is again $S_g$ (given 
above). After condensation, part of this load will go to the host and part 
will be distributed over several additional satellites (so that there is now one 
satellite for each child of $g$). Each of the new satellite loads will be at least 
$e_g + c_g - \max_{i \in C(g)} c_i$ less than the original satellite load $S_g$. This quantity is 
non-negative because $c_g \ge \sum_{i \in C(g)} c_i$. The load on the host will increase by 
$e_g/a + \sum_{i \in C(g)} c_i$ and decrease by $c_g$. The quantity $c_g - e_g/a - \sum_{i \in C(g)} c_i$ is 
non-negative because of (5) and because $a > 1$. 

In at least one case the loads on the satellites and on the host decrease 
or remain unchanged. Thus if there is a partition of weight $\omega$ before 
condensation there will be a partition of weight $\le \omega$ after condensation. □ 

5.3 Monotonic Trees 

A procedure condense_tree can be derived from Theorem 4. This procedure 
goes through the tree and merges together all nodes $f$ and $g$, where $g$ is the 
child of $f$, which satisfy (4) or (5). A tree to which this procedure has 
been applied is called a condensed tree. Condensed trees are monotonic in a 
fashion analogous to condensed chains. 

Theorem 5. In a tree that has been transformed by applying procedure 
condense_tree, $S_g \le S_f$ for all $f$, $g$ such that $f$ is the father of $g$. 

Proof. By contradiction. Suppose $S_g > S_f$. Then 

$\sum_{i \in T(g)} e_i + c_g > \sum_{i \in T(f)} e_i + c_f$ 

$c_g > \sum_{i \in \{T(f) - T(g)\}} e_i + c_f.$    (6) 

But this is impossible since all $f$, $g$ that satisfy (6) are eliminated by 
procedure condense_tree. □ 

This theorem assures us that, once a tree has been condensed, the load 
caused by a subtree cannot exceed the load caused by a containing subtree. 

5.4 Probing Function 

A probing function can now be designed to evaluate if there exists a partition 
of the condensed tree that assigns no more than $\omega$ weight to each of the 
satellites or to the host. This probing function proceeds upwards from the 
leaves of the tree and stops each time it identifies a maximal subtree that has 
weight $\le \omega$. When all such subtrees have been identified, the load on the host 
can be calculated. If this is no more than $\omega$, the function returns true. Since 
the condensed tree is monotonic, i.e. the weight of a subtree is always $\le$ the 
weight of a containing subtree, this probing function needs to look at each 
node only once and will return an answer in $O(m)$ time. 
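The tree probing function can be sketched in Python as follows. The representation is an assumption of this example: nodes are numbered so that `parent[i] < i`, node 0 is the root (with `parent[0] = -1` and an unused `c[0]`), `e[i]` is the satellite execution time of node `i`, `c[i]` is the cost of the edge from node `i` to its father, and `a` is the satellite-to-host speed ratio. The sketch assumes the tree has already been condensed, so that a node's subtree load never exceeds its father's.

```python
def tree_probe(parent, e, c, a, omega):
    """True if the condensed tree has a partition in which every satellite
    subtree and the host carry a load of at most omega."""
    m = len(parent)
    # Bottom-up subtree sums of execution times (children have larger indices).
    sub = list(e)
    for i in range(m - 1, 0, -1):
        sub[parent[i]] += sub[i]
    load = [sub[i] + c[i] for i in range(m)]      # load of subtree i on a satellite
    # The root always stays on the host; by monotonicity the set of nodes
    # whose subtree load exceeds omega is closed under taking fathers.
    on_host = [i == 0 or load[i] > omega for i in range(m)]
    host = 0.0
    for i in range(m):
        if on_host[i]:
            host += e[i] / a                      # executed on the host
        elif on_host[parent[i]]:
            host += c[i]                          # root of a satellite subtree
    return host <= omega


# Hypothetical 3-node tree: a root with two leaf children, host twice as fast.
print(tree_probe([-1, 0, 0], [4, 2, 3], [0, 1, 1], 2.0, 4))    # True
print(tree_probe([-1, 0, 0], [4, 2, 3], [0, 1, 1], 2.0, 3.5))  # False
```

Each node is visited a constant number of times, in line with the $O(m)$ cost stated for the probing function.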

5.5 Partitioning Algorithm 

There are $m$ potential subtrees in our condensed tree. Their weights can be 
evaluated in $O(m)$ time and sorted in $O(m \log m)$ time. Following this, we 
can carry out a binary search over this list to find the optimal value of 
$\omega$. This takes $O(\log m)$ probes each of cost $O(m)$. The overall time for this 
algorithm is thus $O(m \log m)$. This is better than Bokhari’s exact algorithm, 
which takes $O(m^2 \log m)$ time and Iqbal’s approximation algorithm which is 
$O(m \log(W/\epsilon))$. 


6 Conclusions 

The general problem of partitioning a program over a multiple computer 
system has so far eluded an efficient solution. Prior research by Bokhari [2], 
Iqbal [6] and Nicol & O’Hallaron [8] has reported a succession of efficient 
algorithms for the restricted class of chain- or tree- structured programs. In 
the present paper we have described a condensation approach that prepro- 
cesses the given chain or tree in linear time. This condensation makes the 
chain or tree monotonic and permits fast algorithms to be used in the search 
for the optimal partition. 

For the problem of partitioning an $m$ module chain over a chain of $n$ 
processors, we have improved Iqbal’s $O(mn \log(W/\epsilon))$ approximation algorithm 
to $O(m \log(W/\epsilon))$. Our exact algorithm for this problem is $O(mn \log m)$, 
which compares with Nicol & O’Hallaron’s $O(m^2 n)$ exact algorithm and 
their $O(mn \log m)$ bounded cost algorithm. Our exact algorithm makes no 
assumptions about costs. 

When faced with the problem of partitioning $n$ chains of $m$ modules each 
over a host-satellite system, we have developed an $O(n \log m \log(W/\epsilon))$ 
approximation algorithm that is better than Iqbal’s $O(mn \log(W/\epsilon))$ solution. 
Our exact solution is $O(mn \log n)$, which is equal to Nicol & O’Hallaron’s 
algorithm (which again assumes bounded costs). 

Finally, for the problem of partitioning a single tree-structured program 
over a host-satellite system, we have improved Bokhari’s $O(m^2 \log m)$ exact 
solution to $O(m \log m)$. The following table summarizes this discussion. 

Problem                     Linear Array            Host-Satellite            Host-Satellite 
                                                    (chains)                  (tree) 

Bokhari, exact              $m^3 n$                                           $m^2 \log m$ 
N-O’H, exact                $m^2 n$                 $mn \log m$ 
Iqbal, approximate          $mn \log(W/\epsilon)$   $mn \log(W/\epsilon)$     $m \log(W/\epsilon)$ 
N-O’H, bounded costs        $mn \log m$             $mn \log n$ 

Improved Results 
Approximate                 $m \log(W/\epsilon)$    $n \log m \log(W/\epsilon)$   - 
Exact                       $mn \log m$             $mn \log n$               $m \log m$ 

N-O’H: Nicol & O’Hallaron. 